Specimens at the Center: An Informatics Workflow and Toolkit for Specimen-Level Analysis of Public DNA Database Data

Authors

  • Kasey K. Pham
  • Marlene Hahn
  • Kate Lueders
  • Bethany H. Brown
  • Leo P. Bruederle
  • Jeremy J. Bruhl
  • Kyong-Sook Chung
  • Nathan J. Derieg
  • Marcial Escudero
  • Bruce A. Ford
  • Sebastian Gebauer
  • Berit Gehrke
  • Matthias H. Hoffmann
  • Takuji Hoshino
  • Pedro Jiménez-Mejías
  • Jongduk Jung
  • Sangtae Kim
  • Modesto Luceño
  • Enrique Maguilla
  • Santiago Martín-Bravo
  • Robert F. C. Naczi
  • Anton A. Reznicek
  • Eric H. Roalson
  • David A. Simpson
  • Julian R. Starr
  • Tamara Villaverde
  • Marcia J. Waterway
  • Karen L. Wilson
  • Okihito Yano
  • Shuren Zhang
  • Andrew L. Hipp
Abstract

Major public DNA databases (NCBI GenBank, the DNA DataBank of Japan [DDBJ], and the European Molecular Biology Laboratory [EMBL]) are invaluable biodiversity libraries. Systematists and other biodiversity scientists commonly mine these databases for sequence data to use in phylogenetic studies, but such studies generally use only the taxonomic identity of the sequenced tissue, not the specimen identity. Studies that use DNA supermatrices to construct phylogenetic trees with species at the tips therefore typically do not take advantage of two facts: for many individuals in the public DNA databases, several DNA regions have been sampled, and for many species, two or more individuals have been sampled. As a result, these studies typically do not make full use of the multigene datasets in public DNA databases to test species coherence and select optimal sequences to represent a species. In this study, we introduce a set of tools developed in the R programming language to construct individual-based trees from NCBI GenBank data and present a set of trees for the genus Carex (Cyperaceae) constructed using these methods. For the more than 770 species for which we found sequence data, our approach recovered an average of 1.85 gene regions per specimen, up to seven for some specimens, and more than 450 species represented by two or more specimens. Depending on the subset of genes analyzed, we found up to 42% of species monophyletic. We introduce a simple tree statistic, the Taxonomic Disparity Index (TDI), to assist in curating specimen-level datasets and provide code for selecting maximally informative (or, conversely, minimally misleading) sequences as species exemplars. While tailored to the Carex dataset, the approach and code presented in this paper can readily be generalized to constructing individual-level trees from large amounts of data for any species group.

Keywords: Carex, Cyperaceae, phylogenetic workflow, specimen-level data, supermatrix, taxon disparity index (TDI).
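The specimen-level summaries quoted in the abstract (gene regions per specimen, species represented by two or more specimens) can be illustrated with a minimal sketch. It is written in Python for illustration and is not the authors' R toolkit. The record layout, the function name `summarize_by_specimen`, and the species and voucher labels are all hypothetical placeholders, not data from the study.

```python
from collections import defaultdict

# Hypothetical sequence records: (species, specimen voucher, gene region).
# The names below are placeholders, not real taxa or data from the paper.
records = [
    ("Carex alpha", "voucher-001", "ITS"),
    ("Carex alpha", "voucher-001", "ETS"),
    ("Carex alpha", "voucher-002", "ITS"),
    ("Carex beta", "voucher-003", "matK"),
]


def summarize_by_specimen(records):
    """Group sequence records by specimen rather than by species name.

    Returns the mean number of distinct gene regions per specimen and the
    sorted list of species represented by two or more specimens.
    """
    regions_by_specimen = defaultdict(set)
    specimens_by_species = defaultdict(set)
    for species, specimen, region in records:
        regions_by_specimen[(species, specimen)].add(region)
        specimens_by_species[species].add(specimen)

    n_specimens = len(regions_by_specimen)
    if n_specimens == 0:
        return 0.0, []
    total_regions = sum(len(r) for r in regions_by_specimen.values())
    multi = sorted(
        sp for sp, vouchers in specimens_by_species.items() if len(vouchers) >= 2
    )
    return total_regions / n_specimens, multi


mean_regions, multi_specimen_species = summarize_by_specimen(records)
print(f"mean gene regions per specimen: {mean_regions:.2f}")
print("species with two or more specimens:", multi_specimen_species)
```

Keying on the (species, specimen) pair instead of the species name alone is what keeps multiple regions from one individual together, which is the distinction the abstract draws between specimen identity and taxonomic identity.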
Specimen-level data are at the heart of revisionary taxonomy, but much synthetic work in systematics has focused on development of species-level tools for phylogenetics (e.g. supertree and supermatrix approaches, and gene tree – species tree reconciliation) and monography (e.g. Scratchpads [Smith et al. 2012] and Encyclopedia of Life [Parr et al. 2014]). In the collections community, great strides have been made in databasing, georeferencing, and aggregating specimen data

Publication date: 2016